KMID : 0381120140360020191
Genes and Genomics
2014, Vol. 36, No. 2, pp. 191-196
A clustering method for next-generation sequences of bacterial genomes through multiomics data mapping
Seok Ho-Sik

Sim Mi-Kang
Lee Dae-Hwan
Kim Jae-Bum
Abstract
With various 'omics' data recently becoming available, new challenges and opportunities have arisen for research on the assembly of next-generation sequences. To exploit these opportunities, we developed a next-generation sequence clustering method that focuses on the interdependency between genomics and proteomics data. Under the assumption that next-generation read sequences and proteomics data of a target species can both be obtained, we mapped the read sequences against protein sequences and identified physically adjacent reads using a machine learning-based read assignment method. We measured the performance of our method using simulated read sequences and collected protein sequences of Escherichia coli (E. coli). We concentrated on the actual adjacency of the clustered reads in the E. coli genome and found that (i) the proposed method improves the performance of read clustering and (ii) the use of proteomics data has the potential to enhance the performance of genome assemblers. These results demonstrate that the integrative approach is effective for accurately grouping adjacent reads in a genome, which should in turn lead to better genome assembly.
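The idea of mapping reads against protein sequences and grouping reads that hit the same protein can be illustrated with a small sketch. This is not the authors' machine learning-based read assignment method, which the abstract does not specify. It is a simplified stand-in that assumes exact substring matching of six-frame translations. The function names, the `min_len` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict

# Standard genetic code, with codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translations(read):
    """Translate a DNA read in three frames on each strand."""
    strands = (read, read.translate(COMPLEMENT)[::-1])
    peptides = []
    for strand in strands:
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            # Codons with ambiguous bases are translated as "X".
            peptides.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return peptides


def cluster_reads_by_protein(reads, proteins, min_len=5):
    """Group read IDs under the first protein that contains one of their
    translated frames as an exact substring (illustrative assumption)."""
    clusters = defaultdict(list)
    for read_id, seq in reads.items():
        for peptide in six_frame_translations(seq.upper()):
            if len(peptide) < min_len:
                continue
            hit = next((pid for pid, p in proteins.items() if peptide in p), None)
            if hit is not None:
                clusters[hit].append(read_id)
                break  # assign each read to at most one protein
    return dict(clusters)


# Toy data (hypothetical): two proteins and five short reads.
proteins = {"P1": "MKTAYIAKQRQ", "P2": "MSLNDWERTG"}
reads = {
    "r1": "ATGAAAACCGCGTATATT",  # 5' end of the P1 coding sequence
    "r2": "TATATTGCGAAACAGCGT",  # overlaps r1 further along P1
    "r3": "AATATACGCGGTTTTCAT",  # reverse complement of r1
    "r4": "ATGAGCCTGAACGATTGG",  # 5' end of the P2 coding sequence
    "r5": "GGGGGGGGGGGGGGGGGG",  # maps to neither protein
}
clusters = cluster_reads_by_protein(reads, proteins)
```

In this toy run, r1, r2, and r3 fall into the P1 cluster, r4 into P2, and r5 stays unassigned. Reads that share a cluster are candidates for being physically adjacent in the genome, which is the property the abstract evaluates. A realistic pipeline would need tolerant alignment rather than exact matching, since real reads contain sequencing errors.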
KEYWORD
Next-generation sequence assembly, Escherichia coli, Multiomics data, Read clustering
Listed journal information
SCI(E), Korea Research Foundation (KCI)